Analyzing and Visualizing WWII Aerial Bombing Data

By Justin McKinney and Justin Ashbaugh


Introduction

World War II is recorded in the history books as one of the most catastrophic and destructive events in all of history. It resulted in billions of dollars worth of damage and casualties in the millions. Though these numbers are well known, it is worth pondering what destruction of this magnitude actually takes. The data set we analyze here holds a wealth of information about the destructive capabilities of aircraft during WWII, including important data points such as aircraft type, bomb type, target, bomb size, and much more. It is important to note, however, that this data only represents targets and missions conducted by the Allied forces, without the inclusion of Russia. This data is not representative of the Axis powers' contribution to the world's destruction.

Outline

In this tutorial, our goal is to provide you with a more approachable and readable view of the historical data collected during the war. The data in its raw form is incredibly clunky, laden with missing values and column after column of confusing information. Our tutorial tidies this data, then manipulates and frames it for better understanding and investigation. We hope that after reading our tutorial you walk away with a better understanding of the destruction of WWII and the sheer number of bombs that were dropped across the multiple theatres of war. Perhaps with this knowledge you can hold an engaging conversation with your friends, state a fun fact or two, or even write a paper about the tragedy and horror of human ingenuity. If you already knew all of this information, we hope you enjoy our visual representations and a short refresher course in destructive history. If this is new to you, rejoice in the fact that we live in a world where this war has ended, and that you need not fear the looming threat of a B17 bomber overhead.

A few things to note

Step 1: Prep the Dataset

Tidy the data

Let's drop some of those unnecessary columns that we won't be using to make the data a bit more readable.
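As a rough sketch of this step, the snippet below drops a couple of columns from a tiny stand-in data frame. The column names (`Source ID`, `Unit ID`, and the rest) and the values are assumptions made for illustration, so swap in whatever columns your copy of the data actually contains.

```python
import pandas as pd

# Tiny stand-in for the raw data. Column names and values are assumptions.
raw = pd.DataFrame({
    "Mission ID": [1, 2],
    "Mission Date": ["1943-08-15", "1943-08-16"],
    "Aircraft Series": ["B17", "B24"],
    "Total Weight (Tons)": [10.0, None],
    "Source ID": [None, None],
    "Unit ID": ["a", "b"],
})

# Columns we do not plan to use (hypothetical names).
unused = ["Source ID", "Unit ID"]

# errors="ignore" skips any listed name that is missing instead of raising.
tidy = raw.drop(columns=unused, errors="ignore")

print(tidy.columns.tolist())
```

Listing the columns to drop in one place makes it easy to bring one back later if an analysis turns out to need it.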

Step 2: Analysis

Now that we have loaded the data, let's do some basic analysis! Let's see how many tons of TNT worth of bombs the Allies dropped throughout the war. We can do this easily by selecting the column containing the total tons of weaponry dropped and calling sum on that column.
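A minimal version of that sum might look like the following. The column name `Total Weight (Tons)` is an assumption and the numbers are made up.

```python
import pandas as pd

# Made-up values. The column name is an assumption about the dataset.
df = pd.DataFrame({"Total Weight (Tons)": [2.5, None, 4.0, 1.5]})

# Series.sum() skips missing values (NaN) by default.
total_tons = df["Total Weight (Tons)"].sum()

print(f"Total tons dropped: {total_tons:,.1f}")
```

Because missing values are skipped rather than treated as an error, the total is a lower bound whenever rows lack a recorded weight.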

That is a pretty insane amount of weaponry dropped throughout WW2. Now let's try to recreate that graph on the website we got the data from that showed which countries had the most tons of weaponry dropped on them.
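One way to get the numbers behind such a graph is a group-by followed by a sum. The sketch below uses a small hypothetical helper, `tons_by`, with assumed column names and invented values.

```python
import pandas as pd

WEIGHT = "Total Weight (Tons)"  # assumed column name


def tons_by(df, column, weight=WEIGHT):
    """Total tonnage for each value of `column`, largest first."""
    return df.groupby(column)[weight].sum().sort_values(ascending=False)


# Made-up rows for illustration only.
df = pd.DataFrame({
    "Country": ["USA", "GREAT BRITAIN", "USA", "USA"],
    "Target Country": ["GERMANY", "GERMANY", "JAPAN", "ITALY"],
    WEIGHT: [5.0, 3.0, 2.0, 1.0],
})

by_target = tons_by(df, "Target Country")
print(by_target)
```

With matplotlib installed, `by_target.plot(kind="bar")` turns the result into a bar chart. The same helper groups by the country flying the mission if you pass `"Country"` instead.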

Wow! That is a lot of bombs dropped on Germany. Now let's see who was dropping those bombs. We can use the same strategy as for the last graph, except this time we group by the country flying the mission instead of the target country.

The USA and Great Britain unsurprisingly did the most dropping of bombs. Now let's find which planes did most of this bomb dropping.
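To compare planes, it helps to express each aircraft's tonnage as a share of the total. This is a sketch under assumed column names with invented numbers, restricted to missions flown by the USA and Great Britain.

```python
import pandas as pd

# Invented rows. Column names are assumptions about the dataset.
df = pd.DataFrame({
    "Country": ["USA", "USA", "GREAT BRITAIN", "GREAT BRITAIN", "NEW ZEALAND"],
    "Aircraft Series": ["B17", "B24", "LANCASTER", "B17", "P40"],
    "Total Weight (Tons)": [6.0, 2.0, 3.0, 1.0, 4.0],
})

# Keep only the two countries that flew the most missions.
top_two = df[df["Country"].isin(["USA", "GREAT BRITAIN"])]

# Tonnage per aircraft, then each aircraft's percentage of the total.
tons = top_two.groupby("Aircraft Series")["Total Weight (Tons)"].sum()
share = (tons / tons.sum() * 100).sort_values(ascending=False)

print(share.round(1))
```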

The B17 leads this category, having dropped almost 30% of all bombs dropped by the US and Great Britain over the course of the war.

B17

This is impressive but not unexpected as the B17 was one of the most mass-produced and effective bombers of the war. Britannica states that the B17 "was the mainstay of the strategic bombing campaign" for the US.

Now let's try to get a nice overview of the amount of bombing that occurred over time.
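A simple way to build such an overview is to total the tonnage per month. The sketch below assumes a `Mission Date` column stored as ISO-formatted strings, which may not match the raw file, and uses made-up values.

```python
import pandas as pd

# Made-up rows. Column names and the date format are assumptions.
df = pd.DataFrame({
    "Mission Date": ["1943-01-05", "1943-01-20", "1943-02-03", "not a date"],
    "Total Weight (Tons)": [1.0, 2.0, 4.0, 8.0],
})

# Unparseable dates become NaT instead of raising an error.
dates = pd.to_datetime(df["Mission Date"], format="%Y-%m-%d", errors="coerce")

# Bucket each mission into its calendar month and total the tonnage.
monthly = (
    df.assign(month=dates.dt.to_period("M"))
      .dropna(subset=["month"])
      .groupby("month")["Total Weight (Tons)"]
      .sum()
)

print(monthly)
```

Rows with unreadable dates are left out of the monthly totals, so it is worth checking how many there are before trusting the shape of the curve.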

Hmm, what is that slight blip we see right before 1941? That seems to be a lot of bombing for so early in the war.
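To chase down a blip like this, one option is to keep only the missions before 1941 and sort them by tonnage. The rows and column names below are invented for illustration.

```python
import pandas as pd

# Invented rows. Column names are assumptions about the dataset.
df = pd.DataFrame({
    "Mission Date": pd.to_datetime(["1940-11-01", "1940-12-01", "1942-03-10"]),
    "Aircraft Series": ["WELLESLEY", "BLENHEIM", "B17"],
    "Total Weight (Tons)": [50.0, 1.0, 80.0],
})

# Keep missions flown before 1941, heaviest first.
early = df[df["Mission Date"] < pd.Timestamp("1941-01-01")]
suspects = early.sort_values("Total Weight (Tons)", ascending=False)

print(suspects)
```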

It turns out this occurred in Africa and was related to the Northern Front, East Africa, 1940. Our data seems to match the Wikipedia article's section about the British attack on Fort Gallabat. Our data shows 6 WELLESLEY bombers attacking a fort with around 5000 tons of TNT worth of bombs. Wikipedia states "An RAF contingent of six Wellesley bombers and nine Gladiator fighters were thought sufficient to overcome the 17 Italian fighters and 32 bombers believed to be in range. The infantry assembled 1–2 mi (2–3 km) from Gallabat, whose garrison was unaware that an attack was coming until the RAF bombed the fort and put the wireless out of action." An interesting little discovery.

This next section of code requires a slightly more advanced understanding of plotting and data frame manipulation in pandas. The goal of this cell is to build a list of all aircraft used during WWII, then create and label a scatter plot that displays the amount of TNT (in tons) dropped by each aircraft type throughout the war. We use a subplot stacking trick to put multiple plots on the same single plane (pun intended). We then loop through our list of planes. For each plane, we filter our data frame down to the rows that relate to that plane and plot those rows as data points on our graph.
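The loop described above could be sketched roughly as follows. The function name `plot_by_aircraft`, the column names, and the sample rows are assumptions, and the function expects a matplotlib Axes to draw on.

```python
import pandas as pd


def plot_by_aircraft(df, ax, plane_col="Aircraft Series",
                     date_col="Mission Date",
                     weight_col="Total Weight (Tons)"):
    """Draw one scatter series per aircraft on a shared Axes.

    Returns the list of aircraft that were plotted.
    """
    planes = sorted(df[plane_col].dropna().unique())
    for plane in planes:
        # Keep only the rows for the current plane.
        rows = df[df[plane_col] == plane]
        ax.scatter(rows[date_col], rows[weight_col], s=8, label=plane)
    ax.set_xlabel("Mission date")
    ax.set_ylabel("Tons dropped")
    ax.legend(title="Aircraft")
    return planes


# Made-up rows for illustration only.
df = pd.DataFrame({
    "Mission Date": pd.to_datetime(["1943-01-05", "1943-02-10", "1944-03-15"]),
    "Aircraft Series": ["B17", "B24", "B17"],
    "Total Weight (Tons)": [4.0, 2.5, 6.0],
})
```

To draw the figure, create the axes with `fig, ax = plt.subplots()` (after `import matplotlib.pyplot as plt`), call `plot_by_aircraft(df, ax)`, and finish with `plt.show()`. Because every call to `scatter` targets the same `ax`, all aircraft end up on one set of axes.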

Upon first view of the above graph, it might seem like it was done incorrectly, and the code does not work the way it is supposed to. This was our initial thought when seeing the clustered results followed by two major outliers far above any other data points. We decided to pull up those particular data points from our data frame to see what was going on, as these data points made no sense. The following code is what we used to learn more about our very large outliers and determine what our issue could be.
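A quick way to pull up the largest data points is `nlargest`. The sketch below uses invented rows and assumed column names.

```python
import pandas as pd

# Invented rows. Column names are assumptions about the dataset.
df = pd.DataFrame({
    "Mission ID": [101, 102, 103, 104],
    "Aircraft Series": ["B17", "B29", "B29", "B24"],
    "Total Weight (Tons)": [3.0, 15000.0, 20000.0, 5.0],
})

# The two rows with the largest tonnage, biggest first.
outliers = df.nlargest(2, "Total Weight (Tons)")

# Transposing makes wide rows easier to read one field per line.
print(outliers.T)
```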

After successfully isolating the two outliers, we were able to determine that there was, in fact, no bug in our code. The anomaly comes from the only two atomic bombs ever used in war. We were surprised to have forgotten these two events, though they immediately made the data clear and understandable. The events, as listed in the data frame, correspond to the bombings of Hiroshima and Nagasaki by the United States in August of 1945. The bombs Fat Man and Little Boy were dropped, with Fat Man being the larger of the two. The bombs can be seen below (Little Boy first, Fat Man second). Next to them is the B29, the bomber that carried them.

Fat Man, Little Boy, B29

Step 3: More in depth visualization

Now let's bring in folium to visualize which areas were bombed the most. Folium will let us create an interactive heatmap of the areas most bombed by the Allies throughout the war. We are going to build a frequency heatmap overlaid with circle markers that denote the most intensive explosions (top 10,000). These circles indicate, not to scale, the amount of damage caused in relation to other strikes. We will also label a few key cities to guide your interpretation.
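The map described above could be assembled along these lines. This is a sketch, not our exact cell: the helper names, the column names, the map center, the marker styling, and the sample rows are all assumptions. The two data helpers run with pandas alone, while `build_map` needs folium installed.

```python
import pandas as pd

# Assumed column names.
LAT, LON, WEIGHT = "Target Latitude", "Target Longitude", "Total Weight (Tons)"


def heat_points(df):
    """[lat, lon] pairs for every row that has usable coordinates."""
    return df[[LAT, LON]].dropna().values.tolist()


def top_strikes(df, n):
    """The n heaviest strikes that also have coordinates."""
    return df.dropna(subset=[LAT, LON, WEIGHT]).nlargest(n, WEIGHT)


def build_map(df, n_top=10_000):
    """Frequency heatmap plus circle markers for the heaviest strikes."""
    import folium  # imported here so the helpers above work without folium
    from folium.plugins import HeatMap

    fmap = folium.Map(location=[48.0, 15.0], zoom_start=4)
    HeatMap(heat_points(df), radius=8).add_to(fmap)

    top = top_strikes(df, n_top)
    biggest = top[WEIGHT].max()
    for _, row in top.iterrows():
        # Radius grows with tonnage relative to the biggest strike (not to scale).
        scale = (row[WEIGHT] / biggest) ** 0.5 if biggest > 0 else 0.0
        folium.CircleMarker(
            location=[row[LAT], row[LON]],
            radius=2 + 10 * scale,
            color="crimson", fill=True, weight=1,
        ).add_to(fmap)

    # One example city label. Add more markers the same way.
    folium.Marker([52.52, 13.405], tooltip="Berlin").add_to(fmap)
    return fmap


# Made-up rows for illustration only.
df = pd.DataFrame({
    LAT: [52.5, None, 35.7, 50.9],
    LON: [13.4, 13.4, 139.7, 4.4],
    WEIGHT: [10.0, 99.0, 30.0, None],
})
```

In a notebook, evaluating `build_map(df)` as the last expression of a cell displays the interactive map, and `build_map(df).save("map.html")` writes it to a file. Taking the square root of the relative tonnage keeps the largest circles from swamping the rest.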

Step 4: A bit of machine learning (because everything needs machine learning)

Let's say, hypothetically, that you are a person living your life during WW2. We are going to train a machine learning model to predict which type of aircraft would be most likely to drop a bomb on your head. This is very tongue-in-cheek, but it is an interesting way to learn about some basic machine learning concepts.

Details

We are going to train a Decision Tree model in order to predict aircraft type based on given longitude and latitude. We will split the data into a training set and a testing set, then train the model on the training set. After that, we can use the testing set to test how accurate our model is. This is called holdout validation and is a common technique in machine learning.
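Holdout validation with a decision tree can be sketched with scikit-learn as shown below. The data here is synthetic (two invented clusters of targets) and the column names are assumptions, so the score it prints says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["Target Latitude", "Target Longitude"]  # assumed column names
LABEL = "Aircraft Series"

# Synthetic stand-in data: one cluster over Europe, one over Japan.
rng = np.random.default_rng(0)
n = 200
europe = pd.DataFrame({
    "Target Latitude": rng.normal(50, 2, n),
    "Target Longitude": rng.normal(10, 3, n),
    LABEL: "B17",
})
japan = pd.DataFrame({
    "Target Latitude": rng.normal(35, 2, n),
    "Target Longitude": rng.normal(138, 3, n),
    LABEL: "B29",
})
df = pd.concat([europe, japan], ignore_index=True)

# Hold out 20% of the rows for testing. The model never sees them in training.
X_train, X_test, y_train, y_test = train_test_split(
    df[FEATURES], df[LABEL], test_size=0.2, random_state=0, stratify=df[LABEL]
)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
predicted = model.predict(X_test)

accuracy = accuracy_score(y_test, predicted)
print(f"accuracy: {accuracy:.2f}")

# Per-class precision, recall, and f1 score.
print(classification_report(y_test, predicted))
```

The per-class rows of `classification_report` are what let you compare how well the model does on common aircraft versus rare ones.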

This is not an intelligent application of machine learning, but the results are not wildly incorrect, with an accuracy score of 0.58 out of 1. To make a truly accurate model we would need far more data and many more factors besides just latitude and longitude. A decision tree also might not be the best model for the relationship we are trying to show. Obviously, trying to predict what aircraft would bomb a specific location is an impossible task, but it is amusing to see what the model guesses for any longitude and latitude you throw at it.

One good takeaway is how the model's accuracy varies by plane. Notice how the planes that were less common in the dataset have far lower f1 scores, meaning the model's predictions for them were much less reliable than for more common planes like the B29. Planes like the Beaufighter or the F06 had very few entries, so the model really struggled to predict them. Also notice that the B17, the most common aircraft, has an oddly low f1 score compared to another very common bomber like the B29. This is likely because the B17's long service span meant it was used in so many more locations that the model had a hard time finding a clear relationship between location and B17 use.

Now for fun let's try a few tests:
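Asking a trained model about specific cities might look like the sketch below. The training rows are a handful of invented points rather than the real dataset, the column names are assumptions, and the city coordinates are approximate.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["Target Latitude", "Target Longitude"]  # assumed column names

# Invented training rows, far too few for a real model.
train = pd.DataFrame({
    "Target Latitude": [52.5, 50.9, 53.6, 34.4, 32.8, 35.7],
    "Target Longitude": [13.4, 4.4, 10.0, 132.5, 129.9, 139.7],
    "Aircraft Series": ["B17", "B17", "B17", "B29", "B29", "B29"],
})
model = DecisionTreeClassifier(random_state=0)
model.fit(train[FEATURES], train["Aircraft Series"])

# Approximate coordinates of the cities we want to ask about.
cities = pd.DataFrame(
    {"Target Latitude": [35.68, 50.85], "Target Longitude": [139.69, 4.35]},
    index=["Tokyo", "Brussels"],
)

# One predicted aircraft per city, labeled by city name.
guesses = pd.Series(model.predict(cities[FEATURES]), index=cities.index)
print(guesses)
```

Passing the query as a data frame with the same column names used in training keeps scikit-learn from warning about mismatched feature names.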

Some of these predictions do seem reasonable. Tokyo, for example, is logical, as the US did not bomb Japan until late in the war and the B29 came into service much later in WW2. Brussels also makes some sense because B17s were mostly deployed over Europe. Not all of them make complete sense, but it is interesting to see how the inputs impact the prediction.

Conclusion

In this tutorial, we set out to inform you, the reader, about the intricacies of the data science pipeline with a fun, informative, and entertaining series of code snippets and explanations that leave you wanting to explore deeper into the world of data science. We hope the destructive intricacies of WWII bombings were able to keep you interested and engaged with what we had to offer. From tidying and wrangling the expansive amounts of data, to data analysis with heatmaps and folium, and even to small-scale machine learning models, we hope you have learned at least some of the basics of what it takes to be a data scientist. The data analysis we have conducted gives us a better understanding of the sheer destructive power we as human beings are capable of. Despite the intriguing and perhaps even cool nature of this data, it is important to remember that real people experienced a world in which these numbers were more than just numbers: a cruel reality where life had extreme uncertainty and nowhere was truly safe. It is our duty as a collective and united species to work together on our problems and to never reach hostilities of any degree similar to those of WWII. Surely this would lead to total annihilation.